[00:00.580 --> 00:05.300]  Hello DEF CON and welcome to my talk here at the Recon Village for DEF CON 28
[00:05.300 --> 00:11.400]  Safe Mode. Super excited to be here. This is my first time ever as part of the DEF CON
[00:11.400 --> 00:17.640]  speaker community. I'm super excited. So I hope you're excited also while we start talking about
[00:17.640 --> 00:25.880]  Ambly, the smart darknet spider. So who am I that's talking to you about this really cool project?
[00:25.880 --> 00:32.080]  I am known by a few names online. This may or may not be all of them, but it certainly is the
[00:32.080 --> 00:39.400]  few that are most common. Some people will know me as Cytissus Eurydice. Others may be CyberSci.
[00:39.580 --> 00:44.200]  Most people in this community are going to know me as Levitanin or Levi.
[00:44.240 --> 00:48.480]  This is what I am online where I'm actually interacting with people most often.
[00:49.180 --> 00:54.400]  By day I am a cybersecurity incident response professional and by night I'm a darknet
[00:54.400 --> 00:59.940]  researcher by choice and trade. I am a self-proclaimed master of spiders also, which
[00:59.940 --> 01:06.540]  may or may not be both on the computer and off of the computer. Just keep in mind that spiders in
[01:06.540 --> 01:11.700]  real life out in the wild, they're doing a lot of good stuff for us. But those online, they're
[01:11.700 --> 01:17.180]  equally hiding in plain sight doing a lot of cool things that we need them to do every day.
[01:17.340 --> 01:23.300]  And I'm here hoping to create tools that are based off of these spiders and build upon them
[01:23.300 --> 01:27.660]  in order to help us in fields like open source intelligence, threat gathering,
[01:27.660 --> 01:32.360]  and other research areas. But before we get ahead of ourselves, let's talk a little bit
[01:32.360 --> 01:38.860]  about Tor because we're going to be talking Tor a lot today. Keep in mind though that Tor is not
[01:38.860 --> 01:44.680]  the only darknet access point, so we're going to be mentioning a few others as well. Before we get
[01:44.680 --> 01:49.900]  too far ahead of ourselves, because I could talk Tor, darknet, and spiders all day long,
[01:49.900 --> 01:54.580]  let's talk a little bit about the different layers of this presentation. We're going to
[01:54.580 --> 01:59.960]  be going over open source intelligence, or OSINT. We're going to talk about cyber threat intelligence,
[02:00.320 --> 02:05.840]  which you'll often see abbreviated as CTI. We'll talk about the different layers of the internet,
[02:05.840 --> 02:10.840]  which I've already slightly touched on. We'll also talk about the difficulties of finding cyber
[02:10.840 --> 02:16.020]  threat intelligence on the darknet specifically. And finally, this is going to lead us into talking
[02:16.020 --> 02:21.960]  about Ambly, a smart darknet spider specifically designed for cyber threat intelligence.
[02:22.160 --> 02:28.220]  So let's get started. What's open source intelligence? Most people in this village
[02:28.220 --> 02:33.700]  specifically probably have some sort of idea about what open source intelligence is, but we're going
[02:33.700 --> 02:39.440]  to break it down a little bit anyway. Open source intelligence is anything that is accessible from
[02:39.440 --> 02:45.760]  original sources, broken down as accessible original information or data. You can also
[02:46.340 --> 02:52.040]  explain this further by saying something that is posted, viewed, or interacted with by a user,
[02:52.040 --> 02:58.760]  which is accessible publicly online or offline. It doesn't have to be specifically on the internet.
[02:59.180 --> 03:04.360]  This includes, but it's not limited to, the internet at the clear, deep, and dark levels,
[03:04.360 --> 03:11.780]  which we'll get into later. Mass media, television, radio, books, journals, print of all kinds,
[03:11.780 --> 03:18.400]  video games, specialized journals, conference materials, and think tank studies. These are out
[03:18.400 --> 03:22.320]  there. People are talking about them all the time. We're going to be posting about DEF CON for months
[03:22.320 --> 03:28.040]  after this, right? Then there's photos and videos, not just on YouTube or Snapchat, but in general
[03:28.040 --> 03:35.000]  that are posted publicly online. Finally, another area is geospatial information, such as
[03:35.000 --> 03:42.080]  where they actually are located, their GPS, where on a map they may be, or even their IP addresses.
[03:43.860 --> 03:50.880]  So, what is OSINT good for? Absolutely everything. And if you thought about that with a tune in your
[03:50.880 --> 03:56.620]  head, you thought right, because I was talking about that song that's now stuck in your head.
[03:56.620 --> 04:02.860]  You're welcome for the earworm. So, OSINT can be utilized in an array of situations, including,
[04:02.860 --> 04:09.180]  but again, never limited to, designing internal training for a company, understanding your threat
[04:09.180 --> 04:16.160]  profile, beating Keith and Joe at an open source CTF that may or may not be hosted by, you know,
[04:16.160 --> 04:21.870]  Trace Labs today, later on after this presentation. Maybe, maybe not. I don't know. Volunteering,
[04:22.520 --> 04:26.380]  finding out the title of your favorite book from when you were nine, but you forgot existed for
[04:26.380 --> 04:32.160]  about 10 years and you desperately wanted to read again sometime later. Yeah, all of these are
[04:32.160 --> 04:40.500]  actual areas that I have used OSINT for on the clear net and down into the dark net. And that
[04:40.500 --> 04:45.540]  book, for anyone who was interested, was called Goddess of Yesterday, which I highly recommend.
[04:45.760 --> 04:53.320]  It's a good book. But, now you may be asking, if you're not used to this field just yet,
[04:53.320 --> 05:01.620]  is open source intelligence gathering legal? Short answer is yes. Long answer is depends,
[05:01.620 --> 05:09.260]  which is a longer word than yes. The longest answer is, if you're in a country outside the
[05:09.260 --> 05:13.000]  United States, I don't know. It's going to depend. You're going to have to look into your own local
[05:13.000 --> 05:18.860]  laws. Even in the United States, you're going to have to look into your laws per capita, per state.
[05:19.420 --> 05:24.980]  This may be different in some areas. In Europe, I'm expecting that there's probably a lot stricter
[05:24.980 --> 05:29.500]  regulations on what is or is not open source intelligence than there may be here in the
[05:29.500 --> 05:36.280]  United States. That being said, in the U.S., there's a public law that open source intelligence
[05:36.280 --> 05:42.720]  is produced from publicly available information, collected, analyzed, and disseminated in a timely
[05:42.720 --> 05:49.280]  manner to an appropriate audience, and addresses a specific intelligence requirement. This is what's
[05:49.280 --> 05:54.660]  publicly posted on the CIA's website regarding open source intelligence. Now, there are a few
[05:54.660 --> 06:02.340]  other sources that you can look into, including the public law number 108-458, posted in December
[06:02.980 --> 06:14.160]  2004. We also have the FOIA (b)(3) exemptions from 50 U.S.C. 403-1, which is on intelligence sources
[06:14.160 --> 06:22.860]  and methods. And finally, ATP 2-22.9, which is the Establishment of Open Source Intelligence for the
[06:22.860 --> 06:28.380]  Army. These are all laws that are out there that you can read up on, and they will help you
[06:28.380 --> 06:34.680]  understand what is and is not legal in the United States. Again, certain states may vary.
[06:36.060 --> 06:41.120]  All right, we're going on to the next section now. We're going to start diving into cyber threat
[06:41.120 --> 06:46.420]  intelligence. But first, let's take a step back, let's relax a second, and let's think through a
[06:46.420 --> 06:53.680]  scenario that's going to come up again and again in this presentation. One, let's go. During a
[06:53.680 --> 06:59.380]  pandemic, you're working from home. Everyone's working from home, we hope. Stay safe, right?
[06:59.600 --> 07:04.440]  Your company, which is global, just announced it's working on processing pandemic data to help
[07:04.440 --> 07:11.660]  drive towards a cure. Great, that's awesome work. But does that open you up to some threat that may
[07:11.660 --> 07:18.780]  be out there, some cyber threat actors? As part of the cybersecurity team, your job and your goal
[07:18.780 --> 07:24.260]  is to identify if there are any actors out there that you need to be aware of, and anyone who may
[07:24.260 --> 07:30.100]  be actively targeting not only your company, but companies in your industry. So what do you do?
[07:30.100 --> 07:34.240]  How do you find that out? Keep this in mind, we're going to come back to this.
[07:35.860 --> 07:40.720]  All right, what's cyber threat intelligence? I just kind of threw a scenario at you, but what
[07:40.720 --> 07:45.900]  are we even talking about? Well, cyber threat intelligence is the collection and analysis of
[07:45.900 --> 07:51.160]  threat actors' motives, targets, and attack behaviors in the realm of cybersecurity.
[07:51.820 --> 07:57.340]  Often, automated machine learning techniques are implemented for data collection and processing.
[07:57.480 --> 08:03.280]  This is how we are gathering this information. However, you can do it manually. This is helping
[08:03.280 --> 08:10.120]  to shed some light on preemptive actions, or for preemptive actions, and reveals adversarial
[08:10.120 --> 08:14.960]  motives and tactics, techniques, and procedures, or TTPs, which we'll be talking about a lot.
[08:15.900 --> 08:23.220]  The reason we do this is to help in three areas within a company or possibly even the government.
[08:23.280 --> 08:27.920]  The first of these areas is tactical, which is where you're performing malware analysis and
[08:27.920 --> 08:33.120]  enrichment, you're collecting threat indicators, and you're trying to help out with defensive
[08:33.120 --> 08:40.620]  cyber teams. The goal here is to be able to talk to a semi or more technical audience.
[08:40.760 --> 08:45.760]  Then you have the operational team. You have a team that's trying to understand adversarial
[08:45.760 --> 08:52.800]  capabilities, infrastructure, TTPs. You want to leverage this information to conduct targeted
[08:52.800 --> 08:59.620]  prioritization operations. This team is probably more technical with the details and they want to
[08:59.620 --> 09:05.120]  know about attacks and campaigns from the past, the present, and possibly, if you can anticipate it,
[09:05.120 --> 09:12.380]  the future. Finally, we're talking to the strategic team. This is the team that needs to know all of
[09:12.380 --> 09:17.000]  this information at high level. They want to know about adversarial motives. They want to know who's
[09:17.000 --> 09:22.100]  targeting your industry. They want to be able to leverage this information to engage in strategic
[09:22.100 --> 09:27.880]  security. This is for a non-technical audience in most cases, but it's really important that
[09:27.880 --> 09:32.480]  we make sure that they understand what's going on and why it's important to focus in on this
[09:32.480 --> 09:38.600]  information. Now, are there companies out there that are focused on this? Absolutely. A few of the
[09:38.600 --> 09:46.620]  top ones are Recorded Future, CrowdStrike, FireEye, Mandiant, and SANS. And this is an area that more
[09:46.620 --> 09:51.820]  and more people are getting involved in, especially right now and in this scenario. When there's a
[09:51.820 --> 09:56.840]  pandemic or there's a work-from-home situation, you have to be aware of your cybersecurity.
[09:56.840 --> 10:01.040]  So, the more people who are looking at this, the more we need to know about it and know how we can
[10:01.040 --> 10:08.180]  dig into it. So, what are some tools that we can use right now if we wanted to get into open-source
[10:08.180 --> 10:15.720]  intelligence and cyber threat intelligence? Well, we've got Maltego, SpiderFoot, MalShare,
[10:15.720 --> 10:21.640]  the OSINT Framework, which everyone should start at at some point. It's right there and it
[10:21.640 --> 10:27.880]  will lead you through the tree path to whatever you may need. We have CheckUsernames or
[10:27.880 --> 10:33.700]  BeenVerified, which is a great tool to see if a username is used across platforms. You can use it
[10:33.700 --> 10:39.600]  to track people's movements if you need to find a target or if you're working on something like a
[10:39.600 --> 10:45.240]  CTF or Trace Labs. Or you can even use it for yourself to see if the username that
[10:45.240 --> 10:50.560]  you want to kind of trademark as your official account or handle is used anywhere that you may
[10:50.560 --> 10:56.540]  need to grab. Have I Been Pwned? Excellent resource. See if you or anyone else has been
[10:56.540 --> 11:02.060]  pwned, if their email addresses have shown up in any leaks. We've got Censys, Shodan, which is one
[11:02.060 --> 11:09.080]  of my favorite sites, BuiltWith, Google Dorks. If you know how to Google Dork, you probably are a
[11:09.080 --> 11:15.480]  leg up on a lot of people. Google Dorking, which is also called Google Hacking, or just knowing the syntax
[11:15.480 --> 11:22.360]  of how to do a proper search with Google, is an amazing, amazing ability. And it's magic. I don't
[11:22.360 --> 11:29.420]  really know any other way to say it. It's magic. We have Recon-ng, theHarvester, Nmap, Creepy, which
[11:29.420 --> 11:34.560]  is a little creepy, but you know, it knows it's branded well. There's so many tools out there.
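To make the Google Dorking point concrete, here is a minimal sketch of what "knowing the syntax" can look like when a query is assembled in code. The operators `site:`, `filetype:`, `intitle:`, quoted phrases, and a leading minus for exclusion are standard Google search operators; the helper name `build_dork` and the example domain are assumptions made up for illustration and are not from the talk.

```python
def build_dork(terms, site=None, filetype=None, intitle=None, exclude=()):
    """Assemble a Google search string from common advanced operators.

    Hypothetical helper for illustration only.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # require a phrase in the page title
    # Quote multi-word terms so they are matched as exact phrases.
    for term in terms:
        parts.append(f'"{term}"' if " " in term else term)
    # A leading minus sign excludes a word from the results.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


query = build_dork(["threat report"], site="example.com", filetype="pdf", exclude=["draft"])
print(query)  # prints: site:example.com filetype:pdf "threat report" -draft
```

The resulting string can be pasted into the search box as-is. Keeping the operators in a helper like this mainly makes it easier to reuse the same search pattern across many targets.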
[11:34.780 --> 11:40.340]  Trace Labs has a VM full of them. OSINT Combine is another course where you can learn stuff about
[11:40.340 --> 11:45.200]  OSINT, and they have a lot of tools they talk about. Tools are popping up over and over and
[11:45.200 --> 11:50.500]  over again, all over the place, and most of them are open source. Some of them may have paid features
[11:50.500 --> 11:56.340]  that you can or cannot get if you want, but you can do a lot of this stuff without paying a dime,
[11:56.340 --> 12:02.880]  and they're very useful. So I highly recommend digging into these tools. And if we want to talk
[12:02.880 --> 12:07.240]  about these at all, please let me know. I'd be happy to chat with anyone about any of these.
[12:07.540 --> 12:13.680]  But for right now, we're going to move on. We're getting to one of my favorite parts. I could talk
[12:13.680 --> 12:20.460]  about this way too long, so I'm going to limit myself to three slides. Four slides, I lied. Four
[12:20.460 --> 12:26.180]  slides. We're going to talk about the layers of the internet. Number one, out of the box. Please,
[12:26.180 --> 12:33.600]  please, do not believe everything you see in infographs. Because honestly, I don't know how
[12:33.600 --> 12:41.380]  many times I see this whole igloo... is that the word? Ice cube? Iceberg that we want, that we see
[12:41.380 --> 12:48.060]  for the internet, for these three layers, or in some cases up to 12 layers I've seen. But I don't
[12:48.060 --> 12:53.520]  necessarily agree with this one based on my interaction with the layers of the internet,
[12:53.520 --> 13:00.340]  my understanding of it. But even more so than that, just like you can kind of do with stats,
[13:00.340 --> 13:06.000]  you can do almost anything with an infograph and be convincing. You can really get people's
[13:06.000 --> 13:11.400]  attention. So just keep that in mind. I'll get off my horse about that for right now and just
[13:11.400 --> 13:15.980]  let you know that we're going to be talking about the clear net, the deep web, and the dark net.
[13:18.060 --> 13:22.760]  Clear net. Everybody's favorite layer of the internet, even if you don't know it.
[13:23.480 --> 13:29.520]  The clear net is... oh, that might not be true. We'll see. The clear net is where you can access
[13:29.520 --> 13:35.520]  things like Google and Bing and DuckDuckGo. And if you look for something on Google and you can
[13:35.520 --> 13:43.020]  find the page, you're on the clear net. Otherwise, if you can find the front page but you can't find
[13:43.020 --> 13:49.080]  specific data, let's say you joined a forum and you're trying to figure out what's going on with
[13:49.080 --> 13:54.480]  one of the users, you might be able to search that user and see that that forum has a post,
[13:54.480 --> 13:59.960]  but probably not going to be able to see that post. Not without getting through an access point.
[14:00.020 --> 14:05.860]  And that brings you into the deep web. The deep web is anywhere where you're talking to people
[14:05.860 --> 14:10.660]  online. You're on Discord, you're at DEF CON, you're probably on Discord. By the way, DEF CON,
[14:10.660 --> 14:18.320]  you've gone to the DEF CON forums? Deep web. Like to play video games? Twitch or Steam? Deep web.
[14:18.320 --> 14:25.960]  How about ever log into a university or school portal? Ever. Deep web. You have a bank account
[14:25.960 --> 14:34.400]  anywhere? Deep web. All of these areas also have a front-facing website that is accessible via the
[14:34.400 --> 14:42.320]  Google, Bing, DuckDuckGo, whatever your preferred search engine. But because you have to have an
[14:42.320 --> 14:48.540]  access point, you have to go into this, you're running into this area where it actually is the
[14:48.540 --> 14:56.120]  deep web. It's not as accessible, it's not as indexable for those spiders and crawlers behind
[14:56.120 --> 15:01.020]  Google and other search engines. Which is what makes it the deep web. Which if you go back to
[15:01.020 --> 15:09.480]  that infograph, this should be the largest section of the iceberg. Which leaves us with
[15:09.480 --> 15:15.560]  the darknet. My favorite, personally. There's a few different ways of getting onto the darknet,
[15:15.560 --> 15:21.340]  but the top three access points are those we see here. We've got the Tor Project, which is the
[15:21.340 --> 15:28.060]  Onion Router Network, or just Tor, like we said, we're going to talk Tor today. And the focus is
[15:28.060 --> 15:33.780]  going to be on Tor from here on out for right now. But keep in mind that we also have the Freenet.
[15:33.780 --> 15:41.080]  We have I2P and the Bulletin Board System. All of these require specialized access and or knowledge
[15:41.080 --> 15:47.820]  in order to get onto and interact with users. For Tor, you need to get the Tor Browser to access
[15:47.820 --> 15:55.580]  the Onion Routing System. For Freenet, you need another specialized software. You can also get
[15:55.580 --> 16:02.180]  specialized software for I2P, which will run your traffic through different tunnels via peer systems.
[16:02.180 --> 16:09.360]  And BBSs require telnet client software. All of these require extra software and extra knowledge
[16:09.360 --> 16:16.880]  to even find these websites in most cases, or chat rooms, or what have you. And that's what makes this
[16:16.880 --> 16:27.060]  the Darknet. It's not easy to explore, to travel. If you're trying to get from one site to the other,
[16:27.060 --> 16:33.920]  you need to know someone or something to get there. So, what does that mean for us? What is cyber
[16:33.920 --> 16:39.540]  threat intelligence on the Darknet? Like I mentioned, we're going to focus on Tor. But first, let's go
[16:39.540 --> 16:45.580]  back and look at this. How long does it take to find a website for us to use? If you're on the
[16:45.580 --> 16:53.740]  ClearNet, a website that's newly posted takes about four weeks if you get the SEO okay. If you get the
[16:53.740 --> 16:58.840]  SEO set up really well for that website, it can take about four days. And then you'll start seeing
[16:58.840 --> 17:06.080]  it populated on Google or other search engines, depending on how it's set up. So, if I actually go
[17:06.080 --> 17:12.640]  and I log in and I try to find a website on Google, and it's not a hurricane, like it may or may not be
[17:12.640 --> 17:18.720]  outside, then what are we dealing with? A few seconds, maybe a minute. It's kind of based on
[17:18.720 --> 17:24.180]  your internet speed at that point, right? All right, what about the DeepWeb? Well, let's say the DeepWeb
[17:24.180 --> 17:33.140]  website has a ClearNet facing access point, right? Maybe you're trying to find a certain forum post.
[17:33.140 --> 17:39.520]  So, the forum you can find on Google. So, there's the few seconds to a minute. Then you have to
[17:39.520 --> 17:44.420]  either make an account or log in. So, that's the next time. Let's say that takes two minutes. So,
[17:44.420 --> 17:49.000]  that's three minutes now. Then you have to log in. So, once you're logged in, you then have to find
[17:49.000 --> 17:54.460]  the post you want. Now, a lot of these sites have internal indexing, which means you can search
[17:54.460 --> 18:00.420]  stuff internally. And that may take another minute or so. So, that's four minutes, five minutes.
[18:00.680 --> 18:04.340]  That's not terrible, so long as you know what you're looking for. If you don't know what you're
[18:04.340 --> 18:09.540]  looking for, it may take longer. If you're specifically trying to find information on a user
[18:09.540 --> 18:14.880]  or a subject, you may have to do a lot of digging before you come across what you really want.
[18:15.000 --> 18:21.720]  And in those cases, it could be extended for a very long amount of time. That's fine. What about
[18:21.720 --> 18:27.280]  the DarkNet? How are you going to find a DarkNet website? Some of them are posted on Reddit.
[18:27.740 --> 18:32.580]  If you look up DarkNet websites, you're usually going to get Tor, and they're usually on Google.
[18:34.140 --> 18:41.640]  Pastebin, GitHub, some of them are on GitHub, yeah. Well, even if you find it, what do you do
[18:41.640 --> 18:46.720]  with that? It can take some time to find websites. Are they really what you want? What are you
[18:46.720 --> 18:53.720]  looking for? And what about how do you access them? And how long does it take to access them
[18:53.720 --> 19:00.280]  as compared to the ClearNet? Well, let's take a look at our scenario from earlier, and we can
[19:00.280 --> 19:07.520]  try to figure this out. So, some extra details. You've uncovered information about a group
[19:07.520 --> 19:12.860]  targeting your company, or companies like yours. The information you have indicates that
[19:12.860 --> 19:18.420]  this group is active on Tor, but not where they're active. How do we go about finding this group's
[19:18.420 --> 19:25.240]  activity? Let's walk through it. Manual investigation. Step one. Let's figure out
[19:25.240 --> 19:30.400]  what's going on by using Google as an ally. And I'm going to start with Google, because most of us
[19:30.400 --> 19:37.240]  do. Personally, it's not my preferred search engine, but it is one of the better ones. Oops.
[19:38.580 --> 19:47.240]  So, what do we have? I just searched here for Tor websites. I got a few of them. Here's
[19:47.340 --> 19:53.160]  a quick snippet of them. We've got nine best onion sites to visit. Awesome. List of some sites
[19:53.160 --> 20:00.400]  from Wikipedia. Cool. Very nice. We have best onion sites and how to access them safely.
[20:00.460 --> 20:08.120]  Awesome. How to find active .onions darknet sites and why... And finally, we have the best
[20:08.120 --> 20:16.220]  darknet websites you won't find on Google. Hmm. I don't know about you, but that one seems a little
[20:16.220 --> 20:22.960]  weird, considering we found the link on Google. Personally, I think it's funny. So, I'm going to
[20:22.960 --> 20:29.460]  pick that one. So, that leads us to step two. Pick a rabbit hole to dive down, which is what
[20:29.460 --> 20:34.920]  we're going to do with the best darknet websites you won't find on Google, as found on Google.
[20:35.620 --> 20:42.600]  From there, we're going to go into what does this article or this area have. So, the first one they
[20:42.600 --> 20:49.860]  listed was the hidden wiki. Surprise, surprise. There are a lot of hidden wikis. Some of them
[20:49.860 --> 20:54.260]  are mirrors. Some of them are just lists. Some of them are kept up to date, and some aren't.
[20:54.260 --> 20:58.840]  This one is the general normal access point, but there are others, and there are some that are
[20:58.840 --> 21:05.000]  better. They also have here hidden answers, which is another good one, and generally anonymous.
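Pulling the .onion links out of an article like this one by hand gets tedious quickly. As a rough sketch, assuming the page text has already been saved locally, a regular expression can collect the hostnames. Onion hostnames use the base32 alphabet (letters a-z and digits 2-7). The function name and the sample addresses below are placeholders invented for illustration, not real sites.

```python
import re

# Onion hostnames are base32 strings (letters a-z and digits 2-7) ending in ".onion".
ONION_RE = re.compile(r"\b[a-z2-7]+\.onion\b", re.IGNORECASE)


def extract_onion_hosts(text):
    """Return unique .onion hostnames found in text, in order of first appearance."""
    seen = []
    for match in ONION_RE.finditer(text):
        host = match.group(0).lower()  # hostnames are case-insensitive
        if host not in seen:
            seen.append(host)
    return seen


# Placeholder addresses only; neither of these is a real hidden service.
sample = (
    "Start at http://abcdefghij234567.onion/wiki and then try "
    "HTTP://ABCDEFGHIJ234567.onion/ or " + "b" * 56 + ".onion for the newer address."
)
print(extract_onion_hosts(sample))
```

The duplicate in upper case is collapsed into the first entry, so the sample yields two hostnames. A list like this is only a starting point, since many published links turn out to be dead or mirrored.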
[21:06.520 --> 21:14.900]  But one big thing I want to note here. If you look at these two, and I've got my mouse over here just
[21:14.900 --> 21:20.120]  to show you these two URLs. First off, we're on Tor, so all the URLs are going to end with dot
[21:20.120 --> 21:27.100]  onion. But you'll also see that this one is very short. It's a bunch of gobbledygook. You're not
[21:27.100 --> 21:33.780]  going to be able to understand what it is in most cases, but it's short. And that's fine and dandy,
[21:33.780 --> 21:40.120]  but that's a v2 URL. What's going on right now is that we're switching over to v3 URLs, which are
[21:40.120 --> 21:47.700]  these longer strings, usually around, I think, 56 characters. Sometimes you'll be able to see a tag
[21:47.700 --> 21:54.100]  like this, where it's got an actual human readable English readable word in the beginning. But
[21:54.100 --> 21:59.380]  generally speaking, we're just looking for these longer URLs. Those are the ones that are going to
[21:59.380 --> 22:07.620]  be sticking around for a while. This shorter one is going away in 2021. So yay, we found this,
[22:07.620 --> 22:13.240]  but what does that mean? What are we going to do about that? And how does that, how do we get an
[22:13.240 --> 22:20.740]  updated one if this is a good starting point? We're going to talk about that too. So, step four. We
[22:20.740 --> 22:26.020]  found the URLs, but how do we actually access them? If you go and plug and play those dot onion links
[22:26.020 --> 22:31.520]  into your Google Chrome or Firefox or any normal standard browser, you're not going to be able to
[22:31.520 --> 22:38.600]  reach it. You need to be part of the Tor network or the onion routing system. In order to do that,
[22:38.600 --> 22:46.180]  you need either to set up the proxy or you download the Tor browser. So this is Tor. This
[22:46.180 --> 22:52.980]  is the base browser that you have right now. It's updated to 9.5.3. It's based off of Mozilla
[22:52.980 --> 22:58.860]  Firefox and it allows the user to connect to the Tor network and onion router system. Now, you can
[22:58.860 --> 23:05.720]  change a few settings in here. Generally speaking, it's not recommended to have JavaScript on. For
[23:05.720 --> 23:11.700]  example, you want to try to be in the safest mode you can be when going through different sites.
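For the "set up the proxy" route mentioned above, a minimal sketch looks like the following. It assumes a Tor daemon is already running locally and listening on its default SOCKS port, 9050 (the Tor Browser bundle listens on 9150 instead). The helper name `tor_proxies` is hypothetical, and the commented-out usage assumes the third-party `requests` library installed with SOCKS support.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a proxy mapping that sends HTTP and HTTPS traffic through Tor's SOCKS port.

    Hypothetical helper; assumes a local Tor daemon on the given host and port.
    """
    # "socks5h" (rather than "socks5") asks the proxy to resolve hostnames,
    # which .onion addresses need because normal DNS cannot resolve them.
    proxy_url = f"socks5h://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


print(tor_proxies())

# With requests installed with SOCKS support (pip install "requests[socks]"),
# the mapping could be passed to a request like this:
#   import requests
#   response = requests.get("http://<some-address>.onion/", proxies=tor_proxies(), timeout=60)
```

Going through a proxy like this does not give you the hardening that the Tor Browser adds on top, so it is better suited to scripted collection than to interactive browsing.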
[23:12.180 --> 23:17.380]  And you start off with the search engine DuckDuckGo. Now, DuckDuckGo is a clear net search
[23:17.380 --> 23:22.740]  engine. It doesn't necessarily help you with Tor websites, though it can give you some results
[23:22.740 --> 23:29.440]  similar to if you look up on Google. But now we can quickly and easily pivot into using
[23:30.660 --> 23:34.380]  those URLs that we found earlier. So let's go to the hidden wiki.
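Before following any collected link, it can help to sort the short v2 addresses from the long v3 ones, since the v2 ones are being retired. A small sketch, based on the address formats (16 base32 characters for v2, 56 for v3, each followed by `.onion`); the function name and the sample hostnames are placeholders for illustration.

```python
import re

# v2 hostnames: 16 base32 characters; v3 hostnames: 56 base32 characters.
V2_RE = re.compile(r"[a-z2-7]{16}\.onion")
V3_RE = re.compile(r"[a-z2-7]{56}\.onion")


def onion_version(hostname):
    """Classify a bare .onion hostname as 'v2', 'v3', or 'unknown' by its length and alphabet."""
    host = hostname.strip().lower()
    if V3_RE.fullmatch(host):
        return "v3"
    if V2_RE.fullmatch(host):
        return "v2"
    return "unknown"


# Placeholder hostnames only; these are not real hidden services.
for candidate in ["abcdefghij234567.onion", "b" * 56 + ".onion", "example.com"]:
    print(candidate[:20], onion_version(candidate))
```

This only checks the shape of the address. It does not verify the checksum embedded in a v3 address or whether the service is actually reachable.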
[23:34.900 --> 23:39.100]  Here's the first thing that popped up when I went on the hidden wiki about two days ago.
[23:39.300 --> 23:47.480]  Top of the page, hidden wiki, new URL as of 2019-2020. Add this to the bookmark and spread it.
[23:47.480 --> 23:52.540]  This is the current URL for the wiki. Now, can you get there from the small one? Yes,
[23:52.540 --> 24:00.100]  which you can see right here at the top. I was accessing the small v2 URL. This is the v3 URL
[24:00.100 --> 24:05.360]  and it's going to help us stay connected to the wiki as long as it's viable and up. So when you
[24:05.360 --> 24:12.720]  do come across these v3 URLs, you want to grab them or you may lose access to the site, beyond
[24:12.720 --> 24:21.380]  the normal loss of access on the darknet. So we're on the wiki. We're at step five. Visit the starter
[24:21.380 --> 24:30.260]  point. What are we doing on this website? Well, first we're trying to find information about the
[24:30.260 --> 24:37.660]  threat actors for our cyber threat intelligence scenario. So what areas of the wiki can help us?
[24:37.660 --> 24:44.320]  Well, we've got social networks, which, slight side note, this includes Facebook. And for the
[24:44.320 --> 24:50.600]  life of me, I cannot understand how a dot onion site for Facebook is not going against their own
[24:51.100 --> 24:56.740]  terms and policies. But also, if anybody could explain that to me, please, by all means,
[24:56.740 --> 25:02.340]  I would love to hear it. But I also personally wouldn't recommend signing into any of your real
[25:02.340 --> 25:09.920]  life social media accounts, especially at the same time as doing other things on the darknet.
[25:09.920 --> 25:15.760]  The whole point of this is to stay private, to try to stay anonymous. And if you're logging into
[25:15.760 --> 25:21.540]  your personal accounts and random accounts, or your personal accounts and your darknet accounts,
[25:21.920 --> 25:26.700]  it's possible that you could link those two. We want to try to avoid that. Similarly,
[25:26.700 --> 25:30.580]  if you're doing OSINT online and you're not using the darknet, but you're using maybe a
[25:30.580 --> 25:37.480]  virtual machine, you don't want to log into your own stuff in that instance. You could run into a
[25:37.480 --> 25:44.480]  conflict. Anyway, we have social media. We've got Connect, we've got Galaxy 3, which is rather new,
[25:44.480 --> 25:50.300]  we have Torbook, and we have Facebook somehow. Facebook I wouldn't necessarily
[25:50.620 --> 25:55.740]  consider a darknet social media, but it's there. Then we've got Hack, Phreak, Anarchy,
[25:55.740 --> 26:02.600]  Warez, Viruses, and Crack. We've got all of the stuff you need. Just try to rent a hacker, why don't you?
[26:02.600 --> 26:07.740]  Definitely isn't a scam, definitely is not going to backfire on you at all, promise.
[26:10.000 --> 26:16.560]  Do we think the threat actors that we are looking at for the industry, for our cyber
[26:16.560 --> 26:23.680]  threat intelligence scenario, are really taking any action here on these websites? Are they
[26:23.680 --> 26:33.040]  maybe selling their abilities or any code that they've made online? Maybe. Personally,
[26:33.040 --> 26:42.420]  I wouldn't think they'd be on these sites, but it's possible. You can definitely look into them,
[26:42.420 --> 26:49.700]  all depends on what you're trying to get into. Now, there's also another area, which is the
[26:49.700 --> 26:55.800]  introduction points. Earlier, I mentioned the search engines. Those are Google, DuckDuckGo,
[26:55.800 --> 27:01.540]  Bing, clear net search engines, awesome resources for clear net, for open source, all that good
[27:01.540 --> 27:11.780]  stuff. These are considered, in most cases, to be darknet search engines. Now, DuckDuckGo,
[27:11.780 --> 27:16.820]  we talked about, searches the clear net, but it's kind of working with Tor. We've got Ahmia,
[27:16.820 --> 27:22.680]  which is searching Tor websites, but it's on the clear net. Then we've got stuff like Torch or
[27:22.680 --> 27:30.200]  NotEvil. If we wanted to go into these search engines and try to find more information on
[27:30.200 --> 27:35.240]  cyber threat intelligence, how would we start? Well, that's going to bring us to step six. Let's
[27:35.240 --> 27:45.060]  begin the hunt. This is NotEvil. FYI, this will probably get posted after, but if you try to go
[27:45.060 --> 27:52.040]  to NotEvil, it may be down for an upgrade on August 7th. But otherwise, you can pretty much use this
[27:52.040 --> 27:58.500]  like Google. Put in some keywords and start searching. Now, I can tell you, but I will not
[27:58.500 --> 28:04.740]  show you, that when searching NotEvil for anything related to the pandemic, you get, surprise,
[28:04.740 --> 28:13.600]  surprise, a lot of pornography. Most searches will get you a lot of pornography. If we're being
[28:13.600 --> 28:20.160]  serious, that's just kind of how it is. You can tag videos and images and everything with anything,
[28:20.160 --> 28:26.540]  and it kind of works sometimes as a way of hiding actual posts and data, and sometimes it's just
[28:26.540 --> 28:33.680]  because people like to share it. To each their own. But as Pornhub stats have shown in the
[28:33.680 --> 28:38.840]  past, when stuff is going on in day-to-day, you tend to see it turned into pornography and get
[28:38.840 --> 28:45.420]  really popular. So that's fine and dandy, but we're not going to show that here. Instead, what we're
[28:45.420 --> 28:52.340]  going to show is when I looked for hacker. Does stuff show up? Yes. We have some sites. We see some
[28:52.340 --> 29:01.440]  messages here. We see some posts down here. We have Vincent Canfield, a hacker on Keybase, who
[29:01.440 --> 29:09.600]  specifically says, do not chat messages on Keybase. Okay. We have The Social Hacker. We've got a warning
[29:09.600 --> 29:14.040]  on that site, check before purchasing. What are they selling? What's going on here? That's something
[29:14.040 --> 29:20.740]  we might want to look at. Now, what do we do if we search for cyber terrorists? Which is the next
[29:20.740 --> 29:28.640]  little block we have over here. Well, let's see. We've got Daniel's Onion List. That's cool. We've got
[29:28.640 --> 29:35.350]  some other languages that are popping up here. That's nice. Look, we have a mention of the CIA.
[29:35.880 --> 29:44.560]  Okay. We've got Rat Wires. All right. Interesting. So we get a few hits, but is a cyber terrorist
[29:44.560 --> 29:52.340]  group going to label themselves a cyber terrorist group? Maybe. Cyber terrorism is kind of a
[29:52.340 --> 29:58.000]  perspectives game. Most things are a perspective game, really, right? If I'm looking for
[29:59.000 --> 30:04.020]  information about someone that I think has done something bad, I think they've done something bad.
[30:04.020 --> 30:09.720]  They might think they're doing something good. Are they a hacktivist? How do we decide that?
[30:09.800 --> 30:13.780]  That's another thing we'll need to use some cyber threat intelligence skills and techniques to deal
[30:13.780 --> 30:23.780]  with. But let's get back to the next step. Part 7. Part of the talk. Websites on TOR can be, and are,
[30:25.240 --> 30:30.720]  volatile. Torum may be up one day and down the next. Now, Torum is a great website for
[30:31.460 --> 30:37.680]  getting into the cyber security community on Tor. It's a very interesting forum.
[30:37.680 --> 30:45.020]  They got that traditional sweet, sweet terminal green on black going on, so it may or may not hurt
[30:45.020 --> 30:50.100]  your eyes. I would love to show you a picture of it, but when I went to get all the pictures,
[30:50.100 --> 30:56.540]  as the slide mentions, Torum was down. And that doesn't mean Torum is gone forever. Most
[30:56.540 --> 30:59.940]  likely it means it was down for maintenance or maybe they had a weather issue or some other
[30:59.940 --> 31:05.520]  issue wherever it's being hosted. It is what it is. But this can make communication difficult.
[31:05.520 --> 31:10.880]  And if you aren't embedded, you can easily lose track of things, if not be left behind completely.
[31:11.340 --> 31:17.760]  So there is a point at which we may want to talk about how do you safely integrate into
[31:17.760 --> 31:23.120]  the communities to get more information. You're already having a hard time searching the darknet.
[31:23.660 --> 31:28.220]  Is there a point at which you want to embed yourself in communities to start learning more?
[31:28.220 --> 31:33.800]  Is it safe to do? How do you do it safely? These are all really great questions that we don't
[31:33.800 --> 31:38.340]  really have time to get into today, but keep an eye out. This is definitely something I want to
[31:38.340 --> 31:46.000]  talk about with everyone more going forward. So summary slide, right? Cyberthreat intelligence
[31:46.000 --> 31:51.400]  on the darknet. All right, so on the clear net, in parts of the deep web, we can go through profiles,
[31:51.400 --> 31:57.820]  web pages, pretty quickly. What do we say? Like a minute to find a page on Google, give or take.
[31:57.900 --> 32:03.420]  About five, up to five minutes for the deep web if you're just doing a shallow dive. Not too bad.
[32:03.800 --> 32:09.020]  Darknet's different though, huh? Although there are some tools in place, such as darknet search
[32:09.020 --> 32:14.360]  engines, like NotEvil, we don't have the same boosters here as we do everywhere else. It's not
[32:14.360 --> 32:20.200]  quite as easy. It's not quite as indexed. And this is something we take advantage of in our
[32:20.200 --> 32:26.220]  day-to-day lives. A lot of us may even forget a time or not have been alive during a time before
[32:26.220 --> 32:32.940]  Google, before these things were indexed and easy to find. So if we're trying to do darknet research,
[32:32.940 --> 32:38.320]  there's a huge bottleneck here. If you're doing manual investigations, that's a slow process. And
[32:38.320 --> 32:43.860]  getting to websites and finding websites, it's the game of the darknet. You got to trade stuff,
[32:43.860 --> 32:49.960]  right? You've got to interact to really dig in and find these. It's a huge hurdle when trying
[32:49.960 --> 32:55.660]  to hunt for cyber threat intelligence on the darknet. And this is where Ambly comes in. And
[32:55.660 --> 33:00.020]  I'm really excited that we've gotten to this part because this is a project that I have been hoping
[33:00.020 --> 33:04.880]  to work on for a long time. And when I finally got the chance to do it with the SAIL lab,
[33:04.880 --> 33:10.960]  I think I just about lost my mind. I was so excited. So taking a step back, let's talk a
[33:10.960 --> 33:16.660]  little bit about Ambly. Ambly is a smart spider focused on the darknet to gather and identify
[33:16.660 --> 33:23.600]  cyber threat intelligence. Ambly is going to be using some machine learning and
[33:23.600 --> 33:28.760]  artificial intelligence techniques to actually identify relevant websites and
[33:29.760 --> 33:33.940]  to make sure that they're valid and they're pertaining to cyber threat intelligence.
[33:34.380 --> 33:38.940]  We're going to be accessing websites hopefully down the line behind CAPTCHA blocks or user
[33:38.940 --> 33:44.580]  accounts and anti-robot protocols. The goal here is to use deep learning and natural language
[33:44.580 --> 33:50.640]  processing to identify and rank new URLs based on the potential for CTI. We're going to hopefully
[33:50.640 --> 33:55.840]  have a report that gives us webpages and identifies further investigation recommendations
[33:56.400 --> 34:02.300]  so that an analyst, government, researchers, whatever, what have you, can actually use this
[34:02.300 --> 34:07.420]  to get rid of or to reduce that bottleneck that we talked about a moment ago.
[34:08.180 --> 34:14.740]  Reducing the bottleneck is the key behind Ambly's mission here. We want to make this easier and
[34:14.740 --> 34:19.920]  better and not just for cyber threat intelligence. There's other tools that may or may not be in the
[34:19.920 --> 34:26.100]  works here, but this is the goal of Ambly. Ambly is for cyber threat intelligence to identify
[34:26.100 --> 34:32.400]  and track this information on the darknet. Right now, specifically Tor, but that does not mean it's
[34:32.400 --> 34:39.700]  limited. But let's take a step back. That's the Ambly we want. We want to be able to do a lot
[34:39.700 --> 34:46.700]  of stuff with Ambly. But where are we right now? Well, Ambly is in a prototype stage and we'll get
[34:46.700 --> 34:52.860]  to that in a minute. But right now, Ambly is going out and gathering data. Ambly is actually creating
[34:53.120 --> 35:01.080]  a database of viable darknet websites, specifically on Tor, so they're all .onion sites.
[35:01.260 --> 35:08.400]  So how does Ambly do that? Imagine, if you will, another scenario. You're dropped in a random city.
[35:08.640 --> 35:15.320]  You don't know where you are. You don't know what country, state, city, anything. You don't have a
[35:15.320 --> 35:20.500]  phone, you don't have a computer, and you don't know the language of the people around you.
[35:21.080 --> 35:33.680]  What do you do? How do you find safety? How do you get home? Well, personally, I would start walking,
[35:33.680 --> 35:41.080]  searching for any sign of a police station, of some trusted resource that I can go to for help.
[35:42.160 --> 35:48.180]  Now, maybe there's this universal sign that we'll see, or some sign that'll help me indicate that
[35:48.180 --> 35:52.960]  path. Maybe there's not, and maybe I just have to walk. And I go down a path, and I keep following
[35:52.960 --> 35:59.000]  that whichever way it leads, and if it doesn't work, I go back. And we keep doing that almost recursive
[35:59.000 --> 36:05.200]  spin around until we find an area that we need to be in. This is kind of what Ambly is doing,
[36:05.200 --> 36:12.300]  to create that initial data set on Darknet websites. It's just going. It's free to move.
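The walk-and-backtrack idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ambly's actual code: `crawl`, `get_links`, and the toy link graph `TOY_WEB` are hypothetical names, and a real crawler would fetch pages over Tor instead of reading a dictionary.

```python
def crawl(start, get_links):
    """Depth-first style walk: follow a path as far as it leads, then back up."""
    visited = []      # pages in the order they were actually visited
    seen = {start}    # every URL already queued, so nothing is visited twice
    stack = [start]   # paths still waiting to be explored
    while stack:
        url = stack.pop()
        visited.append(url)
        # reversed() so the first link on a page is the next one explored
        for link in reversed(get_links(url)):
            if link not in seen:
                seen.add(link)
                stack.append(link)
    return visited


# Hypothetical link graph standing in for real .onion pages.
TOY_WEB = {
    "wiki.onion": ["forum.onion", "market.onion"],
    "forum.onion": ["wiki.onion", "paste.onion"],
    "market.onion": ["forum.onion"],
    "paste.onion": [],
}

order = crawl("wiki.onion", lambda url: TOY_WEB.get(url, []))
print(order)
```

The `seen` set is what stops the walk from circling forever when pages link back to each other.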
[36:12.300 --> 36:20.720]  It's collecting websites. It's identifying and collecting a whole database on Darknet websites,
[36:20.720 --> 36:25.760]  when they were viable, verifying that we could actually go to the site, not only that the link
[36:25.760 --> 36:32.520]  was out there. And from there, we're going to use that data to start doing really cool things with
[36:32.520 --> 36:40.400]  machine learning. But let's talk about Ambly of today. All right, so we're in a prototype stage,
[36:40.400 --> 36:46.640]  like I mentioned. We're beginning on an active and frequently updated Tor wiki. Not the one I
[36:46.640 --> 36:52.040]  showed you earlier, but a different one. One that's really up to date and monitored frequently.
[36:52.300 --> 36:57.620]  We've got two access points that Ambly has. One is constantly on the Darknet. Anything that goes
[36:57.620 --> 37:02.300]  through, any search, any website, is going through the Darknet connection point. This makes sure
[37:02.300 --> 37:08.360]  that we are always connected to one of the nodes and it keeps us in the network. So even if, by
[37:08.360 --> 37:14.300]  chance, and early on we ran into this, where maybe we found a Clearnet website, going to that site
[37:14.300 --> 37:21.140]  still used the Darknet IP addresses. We've now since removed the ability to go to Clearnet sites,
[37:21.140 --> 37:28.920]  as the focus here is on Darknet only or .onion sites. The second connection is to the
[37:28.920 --> 37:34.640]  Clearnet for the actual database storage, which currently is a MongoDB Atlas format.
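A rough sketch of the "darknet only" rule described above: keep a URL only if its host ends in .onion, and send crawl traffic through the local Tor SOCKS proxy. The names `TOR_PROXIES`, `is_onion_url`, and `filter_onion` are made up for this example and are not from Ambly. The proxy address is Tor's default SOCKS port, and the `socks5h` scheme (as understood by clients such as requests or curl) asks the proxy to resolve hostnames, which .onion addresses need.

```python
from urllib.parse import urlsplit

# Assumption: a Tor client is listening on its default SOCKS port, 9050.
# A proxy mapping in this shape could be handed to an HTTP client with SOCKS support.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def is_onion_url(url):
    """True only when the URL's host is a .onion address."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".onion")


def filter_onion(urls):
    """Drop clear net links so the crawl never leaves the darknet."""
    return [u for u in urls if is_onion_url(u)]


found = ["http://abc123.onion/index", "https://duckduckgo.com/", "http://xyz.onion/"]
print(filter_onion(found))
```

Checking the parsed hostname, instead of searching the whole string for ".onion", avoids being fooled by a clear net URL that merely has ".onion" somewhere in its path.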
[37:35.140 --> 37:42.000]  We'll talk about that in a minute. Now the crawls gather HTML, which includes URLs and text. The
[37:42.000 --> 37:47.220]  URLs are parsed out and stored separately. The text is parsed out and stored separately as well.
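As a small illustration of parsing URLs and text out of a page and keeping them separate, here is a sketch using only Python's standard library. `LinkAndTextParser` and `split_page` are hypothetical names, and the talk does not say which parser Ambly actually uses. Note that the parser only reads the markup it is given; it never requests anything an `<img>` tag points to.

```python
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects link targets and visible text into two separate lists."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self.text = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def split_page(html):
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return {"urls": parser.urls, "text": " ".join(parser.text)}


page = ('<html><body><h1>Hidden Wiki</h1><a href="http://abc.onion/">Forum</a>'
        '<img src="x.png"><script>var a = 1;</script><p>Hello</p></body></html>')
print(split_page(page))
```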
[37:47.220 --> 37:54.320]  And then the HTML is stored as is, but in a binary format. At no point, and this is very important,
[37:54.320 --> 38:00.120]  at no point are images specifically pulled. Illicit materials are avoided and are not meant
[38:00.120 --> 38:06.720]  to be interacted with, including and especially CSAM, which is a term that's used frequently
[38:07.260 --> 38:12.600]  by people who are working against human trafficking and child pornography rings.
[38:12.600 --> 38:21.100]  This is specifically about sensitive materials regarding children that we do not want out there.
[38:21.780 --> 38:27.080]  We are not collecting that with Ambly. You may see on the HTML that an image was there,
[38:27.080 --> 38:33.000]  but the image itself is not pulled. Now the database, like I mentioned, is MongoDB Atlas.
[38:33.240 --> 38:38.420]  File sizes are identified at the pull and marked so that we don't go over the limit.
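MongoDB caps a single BSON document at 16 MB, so one way to stay under that limit is to measure the text when it is pulled and split anything oversized across several documents. The sketch below is an assumption about how such a guard could look, not a description of Ambly's fix: `chunk_text` is a hypothetical helper, and it measures UTF-8 bytes of the text only, which is an approximation of the real BSON size (field names and other fields add overhead, so a real limit should leave headroom).

```python
MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's per-document BSON size limit


def chunk_text(text, max_bytes):
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each.

    max_bytes should be at least 4 so any single character fits in a piece.
    """
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


# Tiny limit used here only so the behaviour is easy to see.
parts = chunk_text("abcdefghij", 4)
docs = [{"url": "http://abc.onion/", "part": i, "text": p} for i, p in enumerate(parts)]
print(docs)
```

Storing each piece as its own document with a `part` index keeps every document bounded no matter how large the page is.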
[38:38.420 --> 38:43.540]  We did go over that. We'll see that in the next slide. The format of the data is important for
[38:43.540 --> 38:50.100]  avoiding those document limits. Again, we'll talk about that. And Ambly runs in a DigitalOcean
[38:50.100 --> 38:59.780]  droplet. All right, Ambly's first 12 hours. So, first 12 hours that we ran Ambly Prototype
[38:59.780 --> 39:06.920]  Stage 1, we had MongoDB Atlas and local. We found they both have a size limit of 16 megabytes per
[39:06.920 --> 39:15.220]  document. And this is what broke our cycle. The reason for that is originally the text was in an
[39:15.220 --> 39:21.760]  unbounded array format, which just extended too large for certain websites to actually be
[39:21.760 --> 39:28.380]  collected. So, what you'll see here is the first 12 hours of Ambly. We collected, in 12 hours,
[39:28.380 --> 39:43.740]  86,546 websites. We got HTML pulled from, rather, 1,819 of those sites. And the text got to 1,818.
[39:43.740 --> 39:51.420]  On the 19th is when the size limit was hit. That has since been rectified. But just
[39:51.420 --> 39:58.660]  imagine this, the first 12 hours, we got so many URLs. People talk about all the time
[39:59.260 --> 40:06.660]  that the darknet is small. And it is compared to the clear net, compared to the deep net, sure,
[40:06.660 --> 40:12.740]  but it is not tiny. There are so many URLs and websites out there. And this is specifically
[40:12.740 --> 40:22.780]  designed to make sure that no two websites are pulled with the same URL. So, even if
[40:22.780 --> 40:28.280]  two sites link to the same website or webpage, only one of them, the first one, is stored. The
[40:28.280 --> 40:39.480]  other is dropped. That is 86,546 unique URLs pulled in the first 12 hours. I don't know about you,
[40:39.480 --> 40:45.620]  but I was ecstatic to see that. So, let's get into the good stuff. Let's talk about some
[40:45.620 --> 40:51.680]  initial points of interest. We've got a few pulls that I went in and I grabbed some interesting
[40:51.680 --> 40:59.180]  sites from. The fourth website to be pulled after going to the Tor Wiki was the Liberated Books and
[40:59.180 --> 41:06.200]  Papers site, which claims to be a collection of books not easy to come by. Our sixth website
[41:06.200 --> 41:12.760]  crawled was an error page for WordPress, which actually helped show that WordPress sites are
[41:12.760 --> 41:17.780]  being used on the Darknet under the name of Torpress, which is really interesting. You can
[41:17.780 --> 41:26.400]  actually see that in the URL. Thirty-two, we found the TMG Mirror List, also known as the Majestic
[41:26.400 --> 41:34.460]  Garden. This has a PGP for joining the Majestic Garden group. And there were multiple, not just
[41:34.460 --> 41:40.280]  these three, there were even more of these URLs that had the exact same thing on them, which brought
[41:40.280 --> 41:46.140]  up a very interesting question. What do we do about sites that are so similar? If they all have the
[41:46.140 --> 41:52.360]  same stuff, but they're just a different URL, do we count them as one entity or combine them?
[41:52.680 --> 41:59.160]  We'll see. All right. Sixty-first website crawled, Relate List, new area of intelligence. Really
[41:59.160 --> 42:04.480]  interesting for looking up information on companies. I'm going to skip ahead a little
[42:04.480 --> 42:09.440]  bit because I'm seeing we're short on time. So let's go into this. Initial points of contact.
[42:09.440 --> 42:16.760]  This is how we see the URLs. We see that the URLs are all stored, we see if it's visited or not,
[42:16.760 --> 42:23.040]  and this is the binary format for the HTML. We crawled a lot of sites, the Wiki, Bitcoin wallets,
[42:24.020 --> 42:29.700]  marketplaces, resellers, you name it, we saw it. It was great. And we still have all of this data
[42:29.700 --> 42:35.320]  to go through. A few other interesting ones include the black and white cards, which is a mysterious
[42:35.320 --> 42:42.900]  group open to members. This one was really interesting just to read through. We also have
[42:42.900 --> 42:46.860]  Rent-A-Hacker, which I don't know if anybody else has noticed this, but there's about five,
[42:46.860 --> 42:51.340]  at minimum, versions of this website under different names and different URLs that say
[42:51.340 --> 42:58.440]  pretty much the exact same thing. So this is pretty much a mirror. Very minor, if any, edits.
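One possible way to flag mirrors like these, where the text is nearly identical but the URL differs, is to compare overlapping word sequences ("shingles") with Jaccard similarity. This is a generic near-duplicate technique offered as a sketch, not something the talk says Ambly does; `shingles`, `jaccard`, and the sample strings are invented for the example.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Similarity from 0.0 (nothing shared) to 1.0 (same shingle sets)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


site_a = "rent a hacker today cheap and fast"
site_b = "rent a hacker today cheap and quick"
site_c = "liberated books and papers not easy to come by"

print(jaccard(site_a, site_b))  # high: likely mirrors
print(jaccard(site_a, site_c))  # low: unrelated pages
```

A threshold on this score, chosen by looking at real data, could then decide whether two URLs are treated as one entity.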
[42:58.700 --> 43:04.760]  But if you need a hacker, I wouldn't go to this one. Again, you're at DEF CON, so really,
[43:04.760 --> 43:12.080]  you need to go to them. Now, Ambly of tomorrow. As a prototype, Ambly is on a road of continuous
[43:12.080 --> 43:18.140]  change. There's a few areas that we want to get down on Ambly's development. One,
[43:18.140 --> 43:23.200]  integrate a new classifier, PermID, currently working on it, that will help Ambly's crawler
[43:23.200 --> 43:29.040]  classify webpages during the initial crawl itself, telling us is this relevant to
[43:29.720 --> 43:36.220]  cyber threat intelligence, is it relevant to cybersecurity, drugs, bitcoins, you tell me.
[43:36.800 --> 43:41.360]  Now, then we want to be able to identify webpages with CAPTCHA, of which we've found a few, and
[43:41.360 --> 43:46.060]  test the CAPTCHA breaker component that I'm also working on. This includes specialized CAPTCHA
[43:46.060 --> 43:54.260]  found on TOR sites, like this gorgeous Dread CAPTCHA. Dread's CAPTCHA methods are both magic
[43:55.260 --> 44:03.800]  beauties, they're beautiful, I can't speak to them, but they're also horrendous to actually
[44:03.800 --> 44:10.680]  deal with as a person. This one's nicer than the old one. Finally, we want to implement a
[44:10.680 --> 44:16.000]  deep learning module using natural language processing to identify relevant links to
[44:16.000 --> 44:21.760]  prioritize. This is really great. I'm really excited about getting that part up and running,
[44:21.760 --> 44:28.620]  and that's really the big next stage. But that's all I have time for today. I'm running out of,
[44:28.620 --> 44:32.760]  I'm pretty much at my limit. So, unfortunately, I'm sorry I had to run a little fast at the end,
[44:32.760 --> 44:37.740]  but thank you so much, DEF CON, and the Recon Village, and everyone who came in today to
[44:37.740 --> 44:43.240]  watch this. Thank you so much for spending some time with me. I also want to say I'm thankful for
[44:43.240 --> 44:48.680]  the Sailors of the Secure and Assured Intelligence Learning Lab that I work with, and the KDD team
[44:48.680 --> 44:54.900]  of Kansas State University. Finally, I volunteer with the Innocent Lives Foundation, and I really
[44:54.900 --> 44:59.560]  wanted to just shout them out here at the last few seconds. I really hope you give them a look,
[44:59.560 --> 45:05.680]  they're great. And part of why I'm so passionate about this is because I get to work with them,
[45:05.680 --> 45:11.340]  not specifically on this project, but on others. I really hope anybody who wants to reach out does,
[45:11.340 --> 45:16.700]  my social media is on this video, and let's chat, let's talk some Tor.
[45:16.960 --> 45:20.620]  Thanks again, everybody. Have a great rest of DEF CON.
